Objective: Social Determinants of Health (SDOH) influence personal health outcomes and health systems interactions. Health systems capture SDOH information through structured data and unstructured clinical notes; however, clinical notes often contain a more comprehensive representation of several key SDOH. The objective of this work is to assess the SDOH information gain achievable by extracting structured semantic representations of SDOH from the clinical narrative and combining these extracted representations with available structured data. Materials and Methods: We developed a natural language processing (NLP) information extraction model for SDOH that utilizes a deep learning entity and relation extraction architecture. In an electronic health record (EHR) case study, we applied the SDOH extractor to a large existing clinical data set with over 200,000 patients and 400,000 notes and compared the extracted information with available structured data. Results: The SDOH extractor achieved 0.86 F1 on a withheld test set. In the EHR case study, we found 19% of current tobacco users, 10% of drug users, and 32% of homeless patients only include documentation of these risk factors in the clinical narrative. Conclusions: Patients who are at risk for negative health outcomes due to SDOH may be better served if health systems are able to identify SDOH risk factors and associated social needs. Structured semantic representations of text-encoded SDOH information can augment existing structured data, and this more comprehensive SDOH representation can assist health systems in identifying and addressing social needs.
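The note-versus-structured-data comparison reported above amounts to set arithmetic over patient identifiers. A minimal sketch, assuming a hypothetical helper `narrative_only_fraction` and toy patient IDs (neither is from the paper's code):

```python
# Sketch of the "information gain" comparison: for each SDOH risk factor,
# count patients whose only documentation comes from the clinical narrative.
# Patient IDs and the function name are illustrative assumptions.

def narrative_only_fraction(structured, extracted):
    """Fraction of positive patients documented only via NLP extraction.

    structured -- set of patient IDs flagged in structured EHR fields
    extracted  -- set of patient IDs flagged by the note-level SDOH extractor
    """
    all_positive = structured | extracted
    only_in_notes = extracted - structured
    return len(only_in_notes) / len(all_positive) if all_positive else 0.0

# Toy example: 3 of 10 tobacco users appear only in the notes.
structured_tobacco = {1, 2, 3, 4, 5, 6, 7}
extracted_tobacco = {5, 6, 7, 8, 9, 10}
fraction = narrative_only_fraction(structured_tobacco, extracted_tobacco)
```

Applied per risk factor (tobacco, drug use, homelessness), this yields the note-only percentages the abstract reports.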
Although machine learning (ML) models of AI achieve high performance in medicine, they are not free of errors. Empowering clinicians to identify incorrect model recommendations is crucial for engendering trust in medical AI. Explainable AI (XAI) aims to address this requirement by clarifying AI reasoning to support the end users. Several recent studies on biomedical imaging have achieved promising results. Nevertheless, solutions for models using tabular data are not yet sufficient to meet the requirements of clinicians. This paper proposes a methodology to support clinicians in identifying failures of ML models trained with tabular data. We built our methodology on three main pillars: decomposing the feature set by leveraging the clinical context latent space, assessing the clinical association of global explanations, and Latent Space Similarity (LSS) based local explanations. We demonstrated our methodology on ML-based recognition of preterm infant morbidities caused by infection. The risk of mortality, lifelong disability, and antibiotic resistance due to model failures was an open research question in this domain. With our approach, we identified misclassification cases of two models. By contextualizing local explanations, our solution provides clinicians with actionable insights to support their autonomy for informed final decisions.
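One plausible reading of an LSS-based local explanation is neighbour retrieval in a latent space: show the clinician the most similar known cases so the model's output can be sanity-checked against them. A toy sketch under that assumption (the latent encoding is taken as given; all names and values are illustrative, not the paper's implementation):

```python
# Hedged sketch of a Latent-Space-Similarity style local explanation:
# retrieve the k training cases closest to a query point in latent space.
# Data and function names are illustrative assumptions.

def lss_neighbours(query, latent_bank, labels, k=3):
    """Return labels of the k nearest latent vectors to `query` (Euclidean)."""
    def dist(v):
        return sum((a - b) ** 2 for a, b in zip(query, v)) ** 0.5
    ranked = sorted(range(len(latent_bank)), key=lambda i: dist(latent_bank[i]))
    return [labels[i] for i in ranked[:k]]

bank = [(0.0, 0.0), (0.1, 0.1), (5.0, 5.0), (5.1, 4.9)]
labels = ["no-infection", "no-infection", "infection", "infection"]
neighbours = lss_neighbours((4.8, 5.2), bank, labels, k=2)
```

If a model's prediction disagrees with the labels of its nearest latent neighbours, that disagreement is a candidate failure case for clinician review.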
End-to-end multilingual ASR has become more appealing for several reasons, such as simplifying the training and deployment process and positive performance transfer from high-resource to low-resource languages. However, scaling up the number of languages, total hours, and number of unique tokens is not a trivial task. This paper explores large-scale multilingual ASR models on 70 languages. We inspect two architectures: (1) a shared embedding and output model and (2) a multiple embedding and output model. In the shared model experiments, we show the importance of the tokenization strategy across different languages. Later, we use our optimal tokenization strategy to train the multiple embedding and output model to further improve our results. Our multilingual ASR achieves 13.9%-15.6% average WER relative improvement compared to monolingual models. We show that our multilingual ASR generalizes well on an unseen dataset and domain, achieving 9.5% and 7.5% WER on Multilingual Librispeech (MLS) with zero-shot and finetuning, respectively.
With the recent growth of computer vision applications, the question of their fairness and bias has not yet been sufficiently explored. There is substantial evidence that biases present in the training data are reflected, and even amplified, in the models. Many previous methods for debiasing image datasets, including models based on augmenting datasets, are computationally expensive to implement. In this study, we propose a fast and effective model to debias image datasets through reconstruction while minimizing the statistical dependence between the intended variables. Our architecture includes a U-Net to reconstruct images, combined with a pre-trained classifier that penalizes the statistical dependence between the target attribute and the protected attribute. We evaluate our proposed model on the CelebA dataset, compare the results with state-of-the-art debiasing methods, and show that the model achieves a promising fairness-accuracy combination.
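The objective sketched in this abstract can be written as a reconstruction term plus a dependence penalty. A minimal sketch, using absolute Pearson correlation as a stand-in dependence measure (the function names, the correlation-based penalty, and all values are illustrative assumptions; the paper's U-Net and pre-trained classifier are not reproduced here):

```python
# Toy debiasing objective: reconstruction error plus a lambda-weighted penalty
# on the statistical dependence (here, |Pearson correlation|) between target
# predictions on reconstructed images and the protected attribute.
# All names and numbers are illustrative assumptions.

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0

def debias_loss(recon_error, target_preds, protected_attrs, lam=1.0):
    """Reconstruction error plus lambda-weighted dependence penalty."""
    return recon_error + lam * abs(pearson(target_preds, protected_attrs))

loss = debias_loss(0.25, [0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0], lam=0.5)
```

Minimizing such a loss pushes the reconstruction to stay faithful while decorrelating the classifier's target predictions from the protected attribute.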
Neural network pruning can be effectively applied to compress automatic speech recognition (ASR) models. However, in multilingual ASR, performing language-agnostic pruning may lead to severe performance degradation on some languages, because a language-agnostic pruning mask may not fit all languages and may discard important language-specific parameters. In this work, we propose ASR pathways, a sparse multilingual ASR model that activates language-specific sub-networks ("pathways"), such that the parameters for each language are learned explicitly. With the overlapping sub-networks, the shared parameters can also enable knowledge transfer to lower-resource languages via joint multilingual training. We propose a novel algorithm to learn ASR pathways and evaluate the proposed method on 4 languages with a streaming RNN-T model. Our proposed ASR pathways outperform both dense models (-5.0% on average) and language-agnostically pruned models (-21.4% on average), and achieve better performance on low-resource languages compared with monolingual sparse models.
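The core "pathways" idea above, each language activating its own binary mask over a shared parameter set, with overlaps enabling sharing, can be sketched in a few lines (weights, masks, and names are toy assumptions, not the paper's model):

```python
# Illustrative sketch of language-specific pathways: each language applies a
# binary mask to a shared parameter vector; overlapping mask positions are
# the parameters shared across languages. All values are toy assumptions.

def apply_pathway(weights, mask):
    """Zero out parameters outside the language's sub-network."""
    return [w * m for w, m in zip(weights, mask)]

shared_weights = [0.5, -1.2, 0.8, 0.3]
masks = {
    "en": [1, 1, 1, 0],  # the middle two parameters are shared
    "fr": [0, 1, 1, 1],  # between the two pathways
}
en_path = apply_pathway(shared_weights, masks["en"])
fr_path = apply_pathway(shared_weights, masks["fr"])
shared = [i for i in range(len(shared_weights))
          if masks["en"][i] and masks["fr"][i]]
```

A language-agnostic mask would force every language through the same positions; per-language masks let each keep its own important parameters while joint training still updates the shared overlap.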
Identifying cohorts of patients based on eligibility criteria such as medical conditions, procedures, and medication use is critical to recruitment for clinical trials. Such criteria are often most naturally described in free text, using language familiar to clinicians and researchers. In order to identify potential participants at scale, these criteria must first be translated into queries over clinical databases, which can be labor-intensive and error-prone. Natural language processing (NLP) methods provide a potential means of automating this translation into database queries. However, they must first be trained and evaluated using corpora which capture clinical trial criteria in sufficient detail. In this paper, we introduce the Leaf Clinical Trials (LCT) corpus, a human-annotated corpus of over 1,000 clinical trial eligibility criteria descriptions using highly granular structured labels capturing a range of biomedical phenomena. We provide details of our schema, the annotation process, corpus quality, and statistics. Additionally, we present baseline information extraction results on this corpus as a benchmark for future work.
There is growing interest in unifying streaming and full-context automatic speech recognition (ASR) networks into a single end-to-end ASR model to simplify model training and deployment for both use cases. In real-world ASR applications, streaming ASR models typically operate under more storage and computational constraints (e.g., on embedded devices) than any server-side full-context model. Motivated by recent progress in Omni-sparsity supernet training, in which multiple subnetworks are jointly optimized within a single model, this work aims to jointly learn a compact sparse on-device streaming ASR model and a large dense server non-streaming model in a single supernet. Next, we show that supernet training on both wav2vec 2.0 self-supervised learning and supervised ASR fine-tuning not only substantially improves the large non-streaming model, as shown in prior work, but also improves the compact sparse streaming model.
We propose a novel deliberation-based approach to end-to-end (E2E) spoken language understanding (SLU), where a streaming automatic speech recognition (ASR) model produces a first-pass hypothesis and a second-pass natural language understanding (NLU) component generates the semantic parse by attending to both the ASR's text and audio embeddings. By formulating E2E SLU as a generalized decoder, our system is able to support complex compositional semantic structures. Furthermore, the sharing of parameters between ASR and NLU makes the system especially suitable for resource-constrained (on-device) environments; our proposed approach consistently outperforms strong pipeline NLU baselines by 0.60% to 0.65% on the spoken version of the TOPv2 dataset (STOP). We demonstrate that the fusion of text and audio features, combined with the system's ability to rewrite the first-pass hypothesis, makes our approach more robust to ASR errors. Finally, we show that our approach can significantly reduce the degradation when moving from training on natural speech to training on synthesized speech, though further work is required for text-to-speech (TTS) to become a viable solution for scaling up E2E SLU.
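The fusion step described above, a second pass attending over both text and audio embeddings, can be sketched with toy dot-product attention over a concatenated memory (the dimensions, values, and function name are illustrative assumptions, not the paper's architecture):

```python
# Toy sketch of second-pass fusion: softmax dot-product attention over a
# memory that concatenates first-pass text embeddings with audio embeddings.
# All vectors are illustrative assumptions.
import math

def attend(query, memories):
    """Softmax dot-product attention; returns the weighted sum of memories."""
    scores = [sum(q * m for q, m in zip(query, mem)) for mem in memories]
    mx = max(scores)
    exps = [math.exp(s - mx) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(memories[0])
    return [sum(w * mem[d] for w, mem in zip(weights, memories))
            for d in range(dim)]

text_emb = [[1.0, 0.0], [0.0, 1.0]]   # first-pass ASR text embeddings
audio_emb = [[0.5, 0.5]]              # audio encoder embeddings
fused = attend([1.0, 0.0], text_emb + audio_emb)
```

Because the memory includes the audio embeddings, the second pass can down-weight a misrecognized text token in favor of acoustic evidence, which is one intuition for the robustness to ASR errors reported above.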
As wireless standards evolve, more complex functionalities are introduced to address the increasing requirements in terms of throughput, latency, security, and efficiency. To unleash the potential of such new features, artificial intelligence (AI) and machine learning (ML) are currently being exploited to derive models and protocols from data, rather than through hand-programming. In this paper, we explore the feasibility of applying ML in next-generation wireless local area networks (WLANs). More specifically, we focus on the IEEE 802.11ax spatial reuse (SR) problem and predict its performance through federated learning (FL) models. The set of FL solutions overviewed in this work is part of the 2021 International Telecommunication Union (ITU) AI for 5G Challenge.
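Federated learning of the kind mentioned here commonly aggregates locally trained weights with federated averaging (FedAvg). A minimal sketch of the aggregation step (the per-client weight vectors and sample counts are toy assumptions; this is not the challenge entry's actual pipeline):

```python
# Minimal FedAvg aggregation step: each client (e.g., a WLAN deployment)
# trains locally, and the server averages the weight vectors, weighted by
# local sample counts. Values are illustrative assumptions.

def fedavg(client_weights, client_sizes):
    """Weighted average of per-client weight vectors."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[d] * n for w, n in zip(client_weights, client_sizes)) / total
        for d in range(dim)
    ]

global_model = fedavg([[1.0, 2.0], [3.0, 4.0]], client_sizes=[1, 3])
```

The appeal for WLAN performance prediction is that raw per-deployment traffic data stays local; only model weights are shared with the aggregation server.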
Using 4.5 million hours of English speech from 10 different sources and models of up to 10 billion parameters, we explore the frontiers of scale for automatic speech recognition. We propose data selection techniques to efficiently scale training data, finding the most valuable samples in massive datasets. To efficiently scale model sizes, we leverage various optimizations such as sparse transducer loss and model sharding. By training 1-10B parameter universal English ASR models, we push the limits of speech recognition performance across many domains. Furthermore, our models learn powerful speech representations with zero- and few-shot capabilities on novel domains and styles of speech, exceeding previous results across multiple in-house and public benchmarks. For speakers with disorders due to brain damage, our best zero-shot and few-shot models achieve 22% and 60% relative WER improvement on the AphasiaBank test set, respectively, while realizing the best performance on public social media videos. Furthermore, the same universal model reaches equivalent performance on the SPGISpeech financial-domain dataset with 500x less in-domain data.